GLBRC collaboration

Adina, Ashley, Shane, Nejc

  1. Sequencing and alignment Summaries
  2. Overarching analysis - changes in metagenome content over time
  3. Analysis of bins (MAGs)
    • Annotate them – who has which core functional genes (identified above)
    • How well do they overlap with our Core from the 16S paper? %coverage – Jackson will be the point person for that
    • Match 16S (but this is problematic because probably won't assemble well)
    • Use Udeobacter paper as a guide for MAG recovery analysis as compared to 16S – use the most resolved taxonomy –
    • Basic stats on MAGs – how abundant are those taxa in the dataset and how persistent are they over time? Where are the gaps, given the core
  4. metaT
    • Map both to metaG and to MAGs
    • Subset the core genes for this analysis?
    • Do we see the same patterns in metaG and metaT, over time?
    • Is everyone active? What proportion of the community? Only certain members? Can we "get away" with just metaG to understand who is active?

Metadata processing for the metagenomic samples

#HIDDEN

%matplotlib inline
from glob import glob # nbi:hide_in
from IPython.display import Image
from ipywidgets import Text, Dropdown, Output, interact
from matplotlib.collections import LineCollection
from matplotlib import pyplot as plt
from pandas import DataFrame, merge, read_csv, Series, to_datetime
from pickle import dump, load
from random import randint 
import cufflinks as cf
import numpy as np
import qgrid
cf.go_offline()
# cf.set_config_file(offline=False, world_readable=True)
cf.set_config_file(offline=True, world_readable=True)

# For metaG
metadataG = read_csv("metadata/GLBRC_MetaG_Metadata.tsv",sep='\t')
metadataG.set_index("nucleic_acid_name",inplace=True)
metadataG.drop(["source","sampling ID","sequencing_type","height_mean_cm","air_temp_c","rep","Sampling Time","reads_file_name","treatment"],axis=1,inplace=True) 
for id in metadataG.index: metadataG.loc[id,"type"] = metadataG[metadataG.index == id].plot_name[0][0:2]
metadataG['Date'] = to_datetime(metadataG.sampling_date)
metadataG=metadataG.sort_values(by=["type","Date"])
metadataG.drop(["sampling_date"],axis=1,inplace=True)
metaG_Reads = read_csv("mapping/metaG/hostRemovalFlagstats/multiqc_data/multiqc_bowtie2.txt",sep="\t")
for id in metaG_Reads.index: metaG_Reads.at[id,"Sample"] = metaG_Reads.at[id,"Sample"].replace(".stat","")
metaG_Reads.set_index("Sample",inplace=True)
metadataG = merge(metadataG,metaG_Reads,left_index=True,right_index=True)
metadataG.sort_values("Date",inplace=True)
metadataG["aligned"] = (metadataG['paired_total']-metadataG['paired_aligned_mate_none_halved'])
metadataG["percPlantAligned"]= metadataG["overall_alignment_rate"]
metadataG["Plant Reads"] = metadataG["aligned"]
metadataG["Total"] = metadataG["paired_total"]
metadataG.drop(['aligned','total_reads','paired_total','paired_aligned_none','paired_total','paired_aligned_one', 'paired_aligned_multi',
       'paired_aligned_discord_one', 'paired_aligned_mate_none', 'paired_aligned_mate_one', 'paired_aligned_mate_multi',
       'overall_alignment_rate', 'paired_aligned_mate_multi_halved', 'paired_aligned_mate_none_halved', 'paired_aligned_mate_one_halved'],axis=1,inplace=True)
metaG_Reads = read_csv("mapping/metaG/fungalRemovalFlagstats/multiqc_data/multiqc_bowtie2.txt",sep="\t")
for id in metaG_Reads.index: metaG_Reads.at[id,"Sample"] = metaG_Reads.at[id,"Sample"].replace(".fungal.stat","")
metaG_Reads.set_index("Sample",inplace=True)
metadataG = merge(metadataG,metaG_Reads,left_index=True,right_index=True)
metadataG.sort_values("Date",inplace=True)
metadataG["Fungal Reads"] = (metadataG['paired_total']-metadataG['paired_aligned_mate_none_halved'])
metadataG["percFungalAligned"]= metadataG["overall_alignment_rate"]
metadataG['percFungalAligned'] = Series([float("{0:.2f}%".format(val).replace("%","")) for val in metadataG['percFungalAligned']], index = metadataG.index)
metadataG['percPlantAligned']  = Series([float("{0:.2f}%".format(val).replace("%","")) for val in metadataG['percPlantAligned']], index = metadataG.index)
metadataG['Remaining'] = metadataG['paired_aligned_mate_none_halved']
metadataG['percRemaining'] = (metadataG["Remaining"]/metadataG['Total'])*100
metadataG.drop(['total_reads','paired_total','paired_aligned_none','paired_total','paired_aligned_one', 'paired_aligned_multi',
       'paired_aligned_discord_one', 'paired_aligned_mate_none', 'paired_aligned_mate_one', 'paired_aligned_mate_multi',
       'overall_alignment_rate', 'paired_aligned_mate_multi_halved', 'paired_aligned_mate_none_halved', 'paired_aligned_mate_one_halved'],axis=1,inplace=True)
G5Data = metadataG[metadataG['type'] == 'G5']
G6Data = metadataG[metadataG['type'] == 'G6']
# qgrid_widgetMG = qgrid.show_grid(metadataG)
# qgrid_widgetMG

Metadata processing for the metatranscriptomic samples

#HIDDEN
metadataT = read_csv("metadata/GLBRC_MetaT_Metadata.tsv",sep='\t')
metadataT.set_index("nucleic_acid_name",inplace=True)
metadataT.drop(["name", "Year","plotID","month","sampleID","day", "year", "time", "weather", "air_temp_c", "notes", "rep", "time_zone", "longitude", \
               "latitude", "altitude", "country", "soil_name", "number_cores", "SPNL_date", "pH", "lime_index", "P_ppm", "K_ppm", \
               "Ca_ppm", "Mg_ppm", "organic_matter", "NO3N_ppm", "NH4_ppm", "soil_moisture_percent", "soil_temp_10cm", "LDMC_mg_per_g", \
               "nitrogen_percent", "carbon_percent", "carbon_per_nitrogen", "height_mean_cm", "mass_per_leaf_g", "date_of_extraction",\
               "nucleic_acid_type", "replicate_extraction", "source", "source_mass", "extraction_method", "elution_vol_ul", "concentration_ng_per_ul",\
               "ratio_260_280", "conc_ng_per_g_source", "extracted_by", "sequence_name", "sequencing_date", "conc_sent_ng_per_ul", \
               "sequencing_type", "sequencing_facility", "primers", "submitted_for_sequencing", "sequencing_successful", "duplicate_submitted",\
               "dup_sequencing_name", "exclude_from_analysis", "itemID_JGI", "sampleID_JGI", "barcode", "JGI_library", "HPCC_path", "JGI_taxonOID",\
               "JGI_rawdataname", "time_numeric", "date", "precipitation", "Air_temp_mean", "air_temp_max", "Air_Temp_Min",\
               "Air_Pressure","plant_name", "RH", "AH", "Wind_Speed_Mean", "Wind_Direction_Mean", "Solar_Radiation", "PAR", "soil_temp_5_cm_bare_avg", \
               "soil_temp_5_cm_sod_avg"],axis=1,inplace=True) 
for id in metadataT.index: metadataT.loc[id,"type"] = metadataT[metadataT.index == id].plot_name[0][0:2]
metadataT['Date'] = to_datetime(metadataT.sampling_date)
metadataT = metadataT.sort_values(by=["type","Date","treatment","plot_name"])
metadataT.head()
#Reads After host removal
metaT_Reads = read_csv("mapping/metaT/hostRemovalFlagstats/multiqc_data/multiqc_bowtie2.txt",sep="\t")
for id in metaT_Reads.index: metaT_Reads.at[id,"Sample"] = metaT_Reads.at[id,"Sample"].replace(".stat","")
metaT_Reads.set_index("Sample",inplace=True)
metaT_Reads["percPlantAligned"] =  ((metaT_Reads['total_reads'] - metaT_Reads['paired_aligned_none']) / metaT_Reads['total_reads'])*100.0
metadataT = merge(metadataT,metaT_Reads,left_index=True,right_index=True)
metadataT.sort_values("Date",inplace=True)
# metadataT.head()

# qgrid_widgetMT = qgrid.show_grid(metadataG)
# qgrid_widgetMT

Sequencing and alignment Summaries

Methodology

Adapter Trimming and QC (Trimmomatic) Report

What is the abundance and quality of the reads in each sample?

Step 1. Split JGI files in PE files

split-paired-reads.py --gzip -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz \$jgi_combined_fastq

Step 2. Use trimmomatic to QC reads
trimmomatic PE -phred33 \$sample.fastq.pe1.gz \$sample.fastq.pe2.gz \$sample.fastq.pe1.gz \$sample.fastq.se1.gz \$sample.fastq.pe2.gz \$sample.fastq.se2.gz \
   ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE \
   LEADING:3 \
   TRAILING:3 \
   SLIDINGWINDOW:4:15 \
   MINLEN:36

[Outline](#home) Skip to: * Adapter Trimming and QC (Trimmomatic) Report * [Host Plant Alignment Report](#pAlign) * [Fungal Alignments](#swAlign) * [Alignment Conclusions](#aConc)

Link to the interactive metaT trim report.

Link to the interactive metaG trim report.

MetaG Trim Report

MetaT Trim Report

Sequencing and alignment Summaries

Host Plant Alignment Report

When the reads are aligned to plant reference genomes, how many of the reads are plant-aligned reads?

Step 1. Align reads to respective host reference assembly

if [[ \$sample == *"G5"* ]]; then
       SAMFILE=\$sample.SWGRASS.sam
       BAMFILE=\$sample.SWGRASS.bam
       bowtie2 --threads \$THREADS -x \$SWITCHGRASS
             -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz
             -U \$sample.fastq.se12.gz -S \$sample.SWGRASS.sam 2>\$sample.stat
else
       SAMFILE=\$sample.MISCANTS.sam
       BAMFILE=\$sample.MISCANTS.bam
       bowtie2 --threads \$THREADS -x \$MISCANTHUS
             -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz
             -U \$sample.fastq.se12.gz -S \$sample.MISCANTS.sam 2>\$sample.stat
fi

[Outline](#home) Skip to: * [Adapter Trimming and QC (Trimmomatic) Report](#seqSums) * Host Plant Alignment Report * [Fungal Alignments](#fAlign) * [Alignment Conclusions](#aConc)
#HIDDEN
# %matplotlib widget
from IPython.core.display import HTML
# disp = {"Percentage":{"Switchgrass & Miscanthus": "html/BothPerc.html", "Switchgrass": "html/SwitchPerc.html", "Miscanthus": "html/MiscanPerc.html"}}
# @interact
# def graph(Source=["MetaG","MetaT"],Host_Plant=["Switchgrass & Miscanthus","Switchgrass","Miscanthus"],By=["Percentage","Count"]): #return HTML(disp[By][Host_Plant])
#     if Source == "MetaG": metadata = metadataG
#     else: metadata = metaT_metadata
#     G5Data = metadata[metadata['type'] == 'G5']
#     G6Data = metadata[metadata['type'] == 'G6']
# #     htmlName = ""; fig = None
#     if By == "Percentage":
#         if (Host_Plant == "Switchgrass & Miscanthus"):
# #             htmlName = "html/BothPerc.html"
#             return metadata.groupby(['Date','plot_name']).sum()['percPlantAligned'].unstack().iplot(asFigure=True,title="% Reads Aligning to Respective Plant Host Assembly G5=Switchgrass G6=Miscanthus");
#         elif Host_Plant == "Switchgrass":
# #             htmlName = "html/SwitchPerc.html"
#             return G5Data.groupby(['Date','plot_name']).sum()['percPlantAligned'].unstack().iplot(asFigure=True,title="% Reads Aligning to Switchgrass Assembly",fontsize=12);
#         else: # "Miscanthus":
# #             htmlName = "html/MiscanPerc.html"
#             return G6Data.groupby(['Date','plot_name']).sum()['percPlantAligned'].unstack().iplot(asFigure=True,title="% Reads Aligning to Micanthus Assembly",fontsize=12);
# #         ylim = 100; ylab = "% Reads"
#     else:
#         colors = ["#74C476", "#f26d07","#794955"]
#         if (Host_Plant == "Switchgrass & Miscanthus"):
# #             htmlName = "html/BothCount.html"
#             return metadata.loc[:,['Plant Reads','Fungal Reads', 'Remaining']].iplot(asFigure=True,kind='bar', barmode = 'stack', color=colors, title="Reads Counts By Source")
#         elif Host_Plant == "Switchgrass":
# #             htmlName = "html/SwitchCount.html"
#             return G5Data.loc[:,['Plant Reads','Fungal Reads', 'Remaining']].iplot(asFigure=True,kind='bar', barmode = 'stack', color=colors, title="Switchgrass Reads Counts")
#         else:
# #             htmlName = "html/MiscanCount.html"
#             return G5Data.loc[:,['Plant Reads','Fungal Reads', 'Remaining']].iplot(asFigure=True,kind='bar', barmode = 'stack', color=colors, title="Miscanthus Reads Counts By Source")
# #     py.offline.plot(fig,filename=htmlName)
# #     return ax #HTML(htmlName)
HTML(filename="html/BothPerc.html")

Sequencing and alignment Summaries

Fungal Alignment Report

What reads are fungal reads?


Step 1. Extract reads that don't align the plant assembly

# A. Convert sam > bam
samtools view -bS \$SAMFILE > \$BAMFILE

# B. Extract all unmapped reads (-f 4) that don't have a mate mapped (-F 256 i.e. both unmapped)
samtools view -b -f 4 -F 256 \$BAMFILE > \$sample.unmapped.bam

# C. Sort the reads
samtools sort -n \$sample.unmapped.bam -o \$sample.unmapped_sorted.bam

# D. Split the unmapped reads in R1 & R2 (back into paired end)
bedtools bamtofastq -i \$sample.unmapped_sorted.bam -fq \$sample.R1.fastq -fq2 \$sample.R2.fastq

Step 2. Align the reads to the combined fungal assemblies
bowtie2 -x \$FUNGAL -1 \$sample.R1.fastq -2 \$sample.R2.fastq -S \$sample.fungal.sam 2>\$sample.fungal.stat

[Outline](#home) Skip to: * [Adapter Trimming and QC (Trimmomatic) Report](#seqSums) * [Host Plant Alignment Report](#pAlign) * Fungal Alignments * [Alignment Conclusions](#aConc)
#HIDDEN
HTML(filename='html/FungalAlign.html')
#HIDDEN
# #HIDDEN
# import plotly.plotly as py
# fig = metadataG.groupby(['Date','plot_name']).sum()['percFungalAligned'].unstack().iplot(yTitle="Percentage of Reads",title="All Samples Perctages of Reads Aligning to Combined Fungal Assemblies",asFigure=True);
# py.offline.plot(fig,filename="html/FungalAlign.html")
# @interact
# def fungalAlign(Samples=["Switchgrass", "Miscanthus", "Switchgrass & Miscanthus"]):
#     metadataG.groupby(['Date','plot_name']).sum()['percFungalAligned'].unstack().iplot(yTitle="Percentage of Reads",title="All Samples Perctages of Reads Aligning to Combined Fungal Assemblies")
#     if Samples == "Switchgrass": return G5Data.groupby(['Date','plot_name']).sum()['percFungalAligned'].unstack().iplot(yTitle="Percentage of Reads",title="Switchgrass Samples Perctages of Reads Aligning to Combined Fungal Assemblies");
#     elif Samples == "Miscanthus": return G6Data.groupby(['Date','plot_name']).sum()['percFungalAligned'].unstack().iplot(yTitle="Percentage of Reads",title="Miscanthus Samples Perctages of Reads Aligning to Combined Fungal Assemblies");
#     else: return metadataG.groupby(['Date','plot_name']).sum()['percFungalAligned'].unstack().iplot(yTitle="Percentage of Reads",title="All Samples Perctages of Reads Aligning to Combined Fungal Assemblies");

Sequencing and alignment Summaries

Sequencing Observations

What patterns are there over the season?
  1. The percentage of plant reads sequenced is high thoughout the season and tapers off towards the end of the season. This may be do to senescence of plant cells.
  2. Alignment to the 6 selected fungal assemblies is low, but picks up during the warmer months. Overall, reads aligning to the combined fungal assembly were <10% of the non-plant reads.

Annotation Summary

The annotations were performed using KEGG's prokaryotic peptides and eukaryotic peptides

Based on the high-overlap, we decided to use the mags instead of the assembly for alignment

MAG Stats

checkm lineage_wf -t 20 -x fa Final.contigs.fa.metabat-bins40 metaBinsStats

#HIDDEN
df = read_csv("mags/MAG_Stats.tsv",sep="\t")
df = df.head(100)
df.to_html("html/MAG_Stats.html")
# qgrid_widgetMAGs = qgrid.show_grid(df)
# qgrid_widgetMAGs
HTML(filename="html/MAG_Stats.html")
Bin Id Marker lineage Unnamed: 2 # Genomes # Markers # Marker sets 0 1 2 3 4 5+ Completeness Contamination Strain heterogeneity
0 bin.7 o__Burkholderiales (UID4105) 54 553 264 3 534 16 0 0 0 99.66 2.65 43.75
1 bin.39 f__Enterobacteriaceae (UID5054) 223 874 303 38 829 7 0 0 0 99.05 0.70 71.43
2 bin.67 o__Pseudomonadales (UID4488) 185 812 308 35 751 26 0 0 0 97.79 3.53 46.15
3 bin.157 o__Actinomycetales (UID1593) 69 400 198 21 364 14 1 0 0 96.84 4.73 29.41
4 bin.125 o__Actinomycetales (UID1663) 488 310 185 18 276 16 0 0 0 93.96 4.23 68.75
5 bin.164 o__Burkholderiales (UID4000) 193 427 214 44 379 4 0 0 0 92.27 1.64 50.00
6 bin.117 o__Rhizobiales (UID3654) 92 481 319 66 401 14 0 0 0 90.51 3.00 78.57
7 bin.202 o__Cytophagales (UID2936) 47 454 336 82 362 10 0 0 0 89.90 2.68 80.00
8 bin.181 o__Rhizobiales (UID3654) 92 481 319 73 381 27 0 0 0 89.84 5.71 33.33
9 bin.188 o__Burkholderiales (UID4000) 193 427 214 77 296 51 3 0 0 87.56 15.48 60.00
10 bin.198 o__Cytophagales (UID2936) 47 454 336 54 379 21 0 0 0 87.28 3.43 61.90
11 bin.121 o__Burkholderiales (UID4000) 193 427 214 68 339 20 0 0 0 87.25 5.02 30.00
12 bin.95 o__Burkholderiales (UID4002) 107 574 251 67 493 14 0 0 0 87.18 2.41 78.57
13 bin.214 o__Flavobacteriales (UID2815) 123 324 204 67 252 5 0 0 0 87.11 1.49 100.00
14 bin.195 k__Bacteria (UID203) 5449 104 58 25 54 16 8 1 0 82.99 35.50 78.26
15 bin.105 o__Actinomycetales (UID1593) 69 400 198 104 282 14 0 0 0 75.84 4.46 28.57
16 bin.88 k__Bacteria (UID203) 5449 104 58 28 39 22 10 5 0 73.82 38.55 34.15
17 bin.142 c__Deltaproteobacteria (UID3216) 83 247 155 49 194 4 0 0 0 73.12 2.58 25.00
18 bin.148 o__Sphingomonadales (UID3310) 26 569 293 154 398 17 0 0 0 72.70 2.26 17.65
19 bin.112 o__Flavobacteriales (UID2815) 123 324 204 100 214 10 0 0 0 71.14 2.71 80.00
20 bin.91 o__Sphingomonadales (UID3310) 26 569 293 191 261 99 18 0 0 69.76 25.37 22.22
21 bin.206 o__Rickettsiales (UID3809) 83 324 211 86 226 12 0 0 0 69.27 3.82 58.33
22 bin.93 o__Actinomycetales (UID1802) 274 385 212 105 254 25 1 0 0 68.43 8.09 53.57
23 bin.114 o__Actinomycetales (UID1663) 488 310 185 92 215 3 0 0 0 67.03 0.69 33.33
24 bin.104 k__Bacteria (UID203) 5449 103 57 52 38 12 1 0 0 67.01 12.41 46.67
25 bin.134 o__Rhizobiales (UID3654) 92 481 319 183 288 10 0 0 0 66.91 2.00 80.00
26 bin.163 o__Rickettsiales (UID3809) 83 324 211 101 216 7 0 0 0 66.80 2.53 14.29
27 bin.146 o__Actinomycetales (UID1697) 387 330 193 131 190 9 0 0 0 64.35 2.27 0.00
28 bin.205 o__Actinomycetales (UID1697) 387 330 193 125 198 7 0 0 0 63.48 3.11 42.86
29 bin.200 k__Bacteria (UID203) 5449 104 58 59 43 2 0 0 0 62.47 2.59 100.00
30 bin.57 o__Cytophagales (UID2936) 47 454 336 192 256 6 0 0 0 60.73 1.79 50.00
31 bin.10 k__Bacteria (UID203) 5449 104 58 63 41 0 0 0 0 60.69 0.00 0.00
32 bin.208 c__Alphaproteobacteria (UID3305) 564 337 221 134 193 10 0 0 0 60.38 2.96 90.00
33 bin.2 o__Actinomycetales (UID1815) 120 574 266 221 331 22 0 0 0 59.37 3.03 36.36
34 bin.175 o__Burkholderiales (UID4000) 193 427 214 171 246 8 2 0 0 58.90 2.35 50.00
35 bin.116 k__Bacteria (UID203) 5449 104 58 61 31 11 0 1 0 58.79 18.97 100.00
36 bin.66 o__Sphingomonadales (UID3310) 26 569 293 263 287 17 2 0 0 56.39 5.49 52.17
37 bin.139 o__Actinomycetales (UID1593) 69 400 198 165 203 31 0 1 0 56.31 7.46 64.86
38 bin.156 k__Bacteria (UID203) 5449 104 58 60 39 5 0 0 0 56.04 6.19 100.00
39 bin.55 k__Bacteria (UID203) 5449 103 57 65 28 9 1 0 0 56.01 13.16 66.67
40 bin.84 o__Burkholderiales (UID4000) 193 427 214 212 206 9 0 0 0 55.53 2.95 11.11
41 bin.171 k__Bacteria (UID203) 5449 104 58 35 31 24 14 0 0 55.22 22.74 3.03
42 bin.132 f__Rhizobiaceae (UID3564) 78 840 354 352 470 18 0 0 0 52.64 1.16 61.11
43 bin.54 k__Bacteria (UID203) 5449 103 57 69 34 0 0 0 0 52.63 0.00 0.00
44 bin.199 k__Bacteria (UID203) 5449 104 58 65 37 2 0 0 0 52.47 3.45 50.00
45 bin.115 k__Bacteria (UID203) 5449 104 58 43 38 12 11 0 0 52.30 19.17 66.67
46 bin.120 o__Sphingomonadales (UID3310) 26 569 293 274 271 23 1 0 0 51.10 5.80 38.46
47 bin.33 k__Bacteria (UID203) 5449 104 58 68 36 0 0 0 0 50.17 0.00 0.00
48 bin.65 k__Bacteria (UID203) 5449 104 58 64 39 1 0 0 0 49.32 1.72 100.00
49 bin.48 k__Bacteria (UID203) 5449 103 57 65 34 4 0 0 0 47.89 5.26 25.00
50 bin.62 k__Bacteria (UID203) 5449 104 58 70 32 2 0 0 0 47.40 2.59 50.00
51 bin.74 k__Bacteria (UID203) 5449 104 58 67 37 0 0 0 0 47.35 0.00 0.00
52 bin.212 p__Bacteroidetes (UID2605) 350 315 209 178 136 1 0 0 0 47.05 0.48 100.00
53 bin.108 p__Bacteroidetes (UID2605) 350 316 210 145 157 14 0 0 0 46.93 3.48 85.71
54 bin.159 o__Sphingomonadales (UID3310) 26 569 293 304 226 36 3 0 0 46.71 8.76 20.00
55 bin.25 o__Pseudomonadales (UID4488) 185 813 308 381 418 14 0 0 0 46.33 1.64 71.43
56 bin.77 k__Bacteria (UID203) 5449 104 58 40 39 14 11 0 0 45.92 12.97 0.00
57 bin.133 k__Bacteria (UID203) 5449 103 57 72 29 2 0 0 0 45.67 2.63 50.00
58 bin.136 o__Actinomycetales (UID1697) 387 330 193 159 164 7 0 0 0 45.65 1.73 14.29
59 bin.30 o__Rhizobiales (UID3654) 92 481 319 270 203 8 0 0 0 45.35 2.10 25.00
60 bin.173 p__Bacteroidetes (UID2605) 350 315 209 171 132 12 0 0 0 45.11 2.62 58.33
61 bin.78 k__Bacteria (UID203) 5449 103 57 72 24 6 1 0 0 42.98 9.94 44.44
62 bin.23 k__Bacteria (UID203) 5449 104 58 76 24 4 0 0 0 42.24 6.90 25.00
63 bin.203 k__Bacteria (UID203) 5449 104 58 75 28 1 0 0 0 42.24 1.72 100.00
64 bin.89 k__Bacteria (UID203) 5449 104 58 49 33 15 5 2 0 41.98 13.85 11.90
65 bin.87 k__Bacteria (UID203) 5449 104 58 76 26 2 0 0 0 40.86 2.59 100.00
66 bin.80 k__Bacteria (UID203) 5449 104 58 76 27 1 0 0 0 40.52 1.72 100.00
67 bin.98 k__Bacteria (UID203) 5449 103 57 75 26 2 0 0 0 40.35 3.51 100.00
68 bin.44 k__Bacteria (UID203) 5449 103 57 73 29 1 0 0 0 37.38 1.75 100.00
69 bin.102 c__Gammaproteobacteria (UID4202) 67 481 276 289 190 2 0 0 0 37.23 0.37 100.00
70 bin.111 k__Bacteria (UID203) 5449 104 58 79 24 1 0 0 0 34.66 1.72 0.00
71 bin.5 o__Rhizobiales (UID3447) 356 414 248 269 138 7 0 0 0 34.17 1.69 0.00
72 bin.68 k__Bacteria (UID203) 5449 104 58 60 25 15 4 0 0 33.04 9.46 3.70
73 bin.211 k__Bacteria (UID203) 5449 103 57 80 23 0 0 0 0 32.62 0.00 0.00
74 bin.213 k__Bacteria (UID203) 5449 103 57 84 10 7 2 0 0 32.46 19.30 61.54
75 bin.182 k__Bacteria (UID203) 5449 104 58 78 25 1 0 0 0 32.01 1.72 100.00
76 bin.131 k__Bacteria (UID203) 5449 104 58 76 24 4 0 0 0 31.82 6.90 25.00
77 bin.81 k__Bacteria (UID203) 5449 103 57 82 21 0 0 0 0 31.58 0.00 0.00
78 bin.90 k__Archaea (UID2) 207 149 107 99 41 9 0 0 0 31.10 6.54 22.22
79 bin.96 k__Bacteria (UID203) 5449 104 58 85 18 1 0 0 0 31.03 1.72 100.00
80 bin.13 k__Bacteria (UID203) 5449 104 58 61 25 12 4 2 0 30.22 13.03 41.67
81 bin.14 k__Bacteria (UID203) 5449 104 58 83 21 0 0 0 0 30.00 0.00 0.00
82 bin.85 k__Bacteria (UID203) 5449 104 58 84 20 0 0 0 0 29.31 0.00 0.00
83 bin.12 k__Bacteria (UID203) 5449 104 58 59 31 13 1 0 0 28.91 8.46 6.25
84 bin.141 o__Actinomycetales (UID1815) 120 574 266 399 174 1 0 0 0 28.71 0.38 0.00
85 bin.168 k__Bacteria (UID203) 5449 104 58 86 17 1 0 0 0 28.45 1.72 100.00
86 bin.53 k__Bacteria (UID203) 5449 104 58 82 22 0 0 0 0 28.06 0.00 0.00
87 bin.170 k__Bacteria (UID203) 5449 104 58 81 23 0 0 0 0 27.35 0.00 0.00
88 bin.21 o__Pseudomonadales (UID4488) 185 802 303 527 273 2 0 0 0 25.75 0.36 100.00
89 bin.140 k__Bacteria (UID203) 5449 104 58 86 16 2 0 0 0 25.34 3.45 100.00
90 bin.58 k__Bacteria (UID203) 5449 103 57 86 16 1 0 0 0 24.72 1.75 100.00
91 bin.127 k__Bacteria (UID203) 5449 104 58 83 21 0 0 0 0 24.25 0.00 0.00
92 bin.149 k__Bacteria (UID203) 5449 104 58 75 28 1 0 0 0 23.97 0.86 100.00
93 bin.128 k__Bacteria (UID203) 5449 104 58 86 18 0 0 0 0 23.43 0.00 0.00
94 bin.41 k__Archaea (UID2) 207 149 107 107 31 9 2 0 0 22.53 3.39 0.00
95 bin.144 o__Actinomycetales (UID1697) 387 330 193 253 73 4 0 0 0 21.98 1.18 0.00
96 bin.52 k__Bacteria (UID203) 5449 104 58 86 18 0 0 0 0 21.47 0.00 0.00
97 bin.154 k__Bacteria (UID203) 5449 103 57 88 14 1 0 0 0 21.37 0.16 100.00
98 bin.190 k__Bacteria (UID203) 5449 104 58 92 12 0 0 0 0 20.69 0.00 0.00
99 bin.124 o__Pseudomonadales (UID4488) 185 813 308 643 168 2 0 0 0 19.72 0.24 50.00

Overarching analysis - changes in metagenome content over time

All bins with at least 50% completeness were used in a combined assembly. The analysis below comes from the counts for each sample. These counts were normalized by the number of reads that did not align to the fungal or host assemblies.

[Outline](#home) Skip to: * Functional annotation accumulation over time * [Is time a factor that affects the functional richness of phylosphere communities (Alpha Diversity)?](#alphaD) * [Do we see changes over time in functional richness?](#funcRich) * [Is there a difference between treatment (Fertilized vs. Unfertilized)?](#treatComp) * [Is there a difference between host plant?](#plantDif) * [PCoA of functional counts](#pcoa) * [Is there similarity between the WGS communities and the 16S data?](#comp16S) * [How variable are metagenomes across replicate time points?](#permdisp)

COG accumulation over time

pcoa

Is time a factor that affects the functional richness of phylosphere communities (Alpha Diversity)?

Switchgrass Pearson's product-moment correlation
Data: Time and Richness
t = 10.223 df = 62 p-value = 6.316e-15
sample estimates: cor 0.7922484
Switchgrass Pearson's product-moment correlation
Data: Time and Shannon
t = 7.1534 df = 62 p-value = 1.166e-09
sample estimates: cor 0.6724267
Switchgrass Pearson's product-moment correlation
Data: Time and Pielou
t = 5.6951 df = 62 p-value = 3.622e-07
sample estimates: cor 0.586056
Miscanthus Pearson's product-moment correlation
Data: Time and Richness
t = 8.5659 df = 70 p-value = 1.658e-12
sample estimates: cor 0.7153784
Miscanthus Pearson's product-moment correlation
Data: Time and Shannon
t = 6.1267 df = 70 p-value = 4.675e-08
sample estimates: cor 0.5908107
Miscanthus Pearson's product-moment correlation
Data: Time and Pielou
t = 7.1602 df = 70 p-value = 6.369e-10
sample estimates: cor 0.6502079

Do we see changes over time in functional richness?

Call: adonis(formula = dist.otu ~ map_16S\$time_numeric)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$time_numeric 1 1.9356 1.93562 47.814 0.26298 0.001 ***
Residuals 134 5.4246 0.04048 0.73702
Total 135 7.3603 1.000000

images/Level1_Mags.png

images/Level2_Mags.png

Is there a difference between treatment (Fertilized vs. Unfertilized)?

Call: adonis(formula = dist.otu ~ map_16S\$treatment)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$treatment 1 0.0289 0.028930 0.52878 0.00393 0.755
Residuals 134 7.3313 0.054711 0.99607
Total 135 7.3603 1.000000

Is there a difference between host plant?

Call: adonis(formula = dist.otu ~ map_16S\$plant)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$plant 1 0.1617 0.161704 3.0101 0.02197 0.027 *
Residuals 134 7.1986 0.053721 0.97803
Total 135 7.3603 1.000000

PCoA

Is there similarity between the WGS communities and the 16S data (mantel/procrustes)?

Mantel statistic based on Pearson's product-moment correlation Call: mantel(xdis = genes.dist, ydis = otu.dist) Mantel statistic r: 0.6526 Significance: 0.001 Upper quantiles of permutations (null model): 90% 95% 97.5% 99% 0.0833 0.1194 0.1458 0.1849 Permutation: free Number of permutations: 999 Call: protest(X = genes.pcoa, Y = otu.pcoa) Procrustes Sum of Squares (m12 squared): 0.5402 Correlation in a symmetric Procrustes rotation: 0.6781 Significance: 0.001 Permutation: free Number of permutations: 999

MetaT Kallisto

Link to the interactive tables. Static tables shown below.

Alignment of the MetaT reads to the MAG assembly

MetaT alignment by count

MetaT alignment by percentage

Next Steps

  1. Fix the metadata to ensure that each sample is accounted for and we have the seq files (Nejc)      ✓ Completed Wednesday, Feb 19,2020. Delivered by Kiera
  2. Modify the R analysis script to read results from the MetaT data (Nejc & Shane)

    In order for this to progress, 1 had to be complete. The next step is figure out why the metadata for the MetaT is not meshing with the R script.

  3. Explore the Kallisto output to determine how to extract annotations (Shane)

    Kallisto has an arguement to export a bam file when it runs, but I do not know how these bam files work or if they are the same thing as a bam from bowtie2. I am working with kallisto to get a bam file now and then I will see if that file works to extract annotations from the contigs